Probabilistic Model Counting with Short XORs
The idea of counting the number of satisfying truth assignments (models) of a
formula by adding random parity constraints can be traced back to the seminal
work of Valiant and Vazirani, showing that NP is as easy as detecting unique
solutions. While theoretically sound, the random parity constraints in that
construction have the following drawback: each constraint, on average, involves
half of all variables. As a result, the branching factor associated with
searching for models that also satisfy the parity constraints quickly gets out
of hand. In this work we prove that one can work with much shorter parity
constraints and still get rigorous mathematical guarantees, especially when the
number of models is large so that many constraints need to be added. Our work
is based on the realization that the essential feature for random systems of
parity constraints to be useful in probabilistic model counting is that the
geometry of their set of solutions resembles an error-correcting code.
Comment: To appear in SAT 1
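The counting trick is easy to demonstrate at toy scale. The sketch below is illustrative only (the formula, XOR length, and trial count are arbitrary choices, not the paper's construction): it brute-forces the models of a 10-variable formula, adds m random short parity constraints, and rescales the number of surviving models by 2^m.

```python
import itertools
import random

random.seed(0)
N = 10  # number of Boolean variables

def formula(x):
    # toy formula: x0 OR x1, satisfied by 3/4 of all assignments
    return x[0] or x[1]

models = [x for x in itertools.product([0, 1], repeat=N) if formula(x)]
true_count = len(models)  # 2**10 - 2**8 = 768

def estimate(m, xor_len=3, trials=200):
    """Unbiased estimate: 2**m times the number of models surviving
    m random parity constraints, each over xor_len variables."""
    total = 0
    for _ in range(trials):
        cons = [(random.sample(range(N), xor_len), random.randint(0, 1))
                for _ in range(m)]
        survivors = sum(
            all(sum(x[i] for i in s) % 2 == b for s, b in cons)
            for x in models)
        total += (2 ** m) * survivors
    return total / trials

est = estimate(m=5)
print(true_count, est)
```

Because each assignment satisfies a uniformly random parity constraint with probability 1/2, the estimator is unbiased; the paper's contribution concerns when short XORs keep its variance under control.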
Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access
Pregel is a popular distributed computing model for dealing with large-scale
graphs. However, it can be tricky to implement graph algorithms correctly and
efficiently in Pregel's vertex-centric model, especially when the algorithm has
multiple computation stages, complicated data dependencies, or even
communication over dynamic internal data structures. Some domain-specific
languages (DSLs) have been proposed to provide more intuitive ways to implement
graph algorithms, but due to the lack of support for remote access --- reading
or writing attributes of other vertices through references --- they cannot
handle the above-mentioned dynamic communication, making a class of Pregel
algorithms with fast convergence impossible to implement.
To address this problem, we design and implement Palgol, a more declarative
and powerful DSL which supports remote access. In particular, programmers can
use a more declarative syntax called chain access to naturally specify dynamic
communication as if directly reading data on arbitrary remote vertices. By
analyzing the logic patterns of chain access, we provide a novel algorithm for
compiling Palgol programs to efficient Pregel code. We demonstrate the power of
Palgol by using it to implement several practical Pregel algorithms, and the
evaluation result shows that the efficiency of Palgol is comparable with that
of hand-written code.
Comment: 12 pages, 10 figures, extended version of APLAS 2017 paper
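The vertex-centric model the abstract refers to can be sketched in a few lines. The mini-engine below is a hypothetical toy, not Palgol or Pregel itself: synchronous supersteps with message passing, used here to compute connected components by minimum-label propagation.

```python
def pregel(vertices, edges, init, compute, max_steps=50):
    """Synchronous supersteps: every vertex reads its inbox, updates its
    state, and sends messages delivered in the next superstep."""
    state = {v: init(v) for v in vertices}
    inbox = {v: [] for v in vertices}
    for step in range(max_steps):
        outbox = {v: [] for v in vertices}
        changed = False
        for v in vertices:
            new, msgs = compute(v, state[v], inbox[v], edges.get(v, []), step)
            changed = changed or new != state[v]
            state[v] = new
            for dst, m in msgs:
                outbox[dst].append(m)
        inbox = outbox
        if step > 0 and not changed:
            break
    return state

def cc_compute(v, label, msgs, nbrs, step):
    # connected components: adopt the smallest label seen, forward on change
    new = min([label] + msgs)
    send = step == 0 or new != label
    return new, [(u, new) for u in nbrs] if send else []

edges = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
components = pregel(range(5), edges, lambda v: v, cc_compute)
```

Dynamic communication of the kind Palgol targets (following references to read data on arbitrary remote vertices) is exactly what this message-only interface makes awkward to express directly.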
Asynchronous Graph Pattern Matching on Multiprocessor Systems
Pattern matching on large graphs is the foundation for a variety of
application domains. Strict latency requirements and continuously increasing
graph sizes demand the usage of highly parallel in-memory graph processing
engines that need to consider non-uniform memory access (NUMA) and concurrency
issues to scale up on modern multiprocessor systems. To tackle these aspects,
graph partitioning becomes increasingly important. Hence, in this paper we
present a technique for processing graph pattern matching on NUMA systems. As a
scalable pattern matching processing infrastructure, we leverage a
data-oriented architecture that preserves data locality and minimizes
concurrency-related bottlenecks on NUMA systems. We show in detail how graph
pattern matching can be asynchronously processed on a multiprocessor system.
Comment: 14 pages, extended version for ADBIS 201
Timing properties and correctness for structured parallel programs on x86-64 multicores
This paper determines correctness and timing properties for structured parallel programs on x86-64 multicores. Multicore architectures are increasingly common, but real architectures have unpredictable timing properties, and even correctness is not obvious above the relaxed-memory concurrency models that are enforced by commonly-used hardware. This paper takes a rigorous approach to correctness and timing properties, examining common locking protocols from first principles and extending this through queues to structured parallel constructs. We prove functional correctness and derive simple timing models, extending both, for the first time, from low-level primitives to high-level parallel patterns. Our derived high-level timing models for structured parallel programs allow us to accurately predict upper bounds on program execution times on x86-64 multicores.
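The paper's timing models cannot be reconstructed from this summary, but their flavour can be sketched. Below is an illustrative upper-bound model with made-up parameters, assuming a lock-based task farm in which queue operations serialise on the lock while per-task work divides across workers.

```python
def farm_upper_bound(n_tasks, t_task, t_queue_op, workers):
    """Illustrative upper bound on makespan for a lock-based task farm:
    queue operations serialise; the per-task work divides across workers."""
    parallel_part = -(-n_tasks // workers) * t_task  # ceil(n/w) tasks each
    serial_part = n_tasks * t_queue_op               # lock serialises the queue
    return serial_part + parallel_part

# predicted upper bounds for 100 tasks of 1 time unit, 0.01 per queue op
bounds = {w: farm_upper_bound(100, 1.0, 0.01, w) for w in (1, 2, 4, 8)}
```

The serial queue term is independent of the worker count, so predicted speedup saturates as workers are added, which is the kind of qualitative behaviour such models capture.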
On the Usability of Probably Approximately Correct Implication Bases
We revisit the notion of probably approximately correct implication bases
from the literature and present a first formulation in the language of formal
concept analysis, with the goal to investigate whether such bases represent a
suitable substitute for exact implication bases in practical use-cases. To this
end, we quantitatively examine the behavior of probably approximately correct
implication bases on artificial and real-world data sets and compare their
precision and recall with respect to their corresponding exact implication
bases. Using a small example, we also provide qualitative insight that
implications from probably approximately correct bases can still represent
meaningful knowledge from a given data set.
Comment: 17 pages, 8 figures; typos added, corrected x-label on graph
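Precision and recall against an exact implication base can be computed directly on a tiny formal context. In this sketch (a toy stand-in, not the paper's PAC construction) the "approximate" base is simply the set of implications valid on a sample of the objects.

```python
from itertools import combinations

objects = [  # toy formal context: each object is its attribute set
    {"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a"},
]
attrs = ("a", "b", "c")

def holds(premise, conclusion, data):
    """An implication A -> B holds iff every object containing A contains B."""
    return all(conclusion <= o for o in data if premise <= o)

def valid_implications(data):
    subsets = [frozenset(c) for r in range(len(attrs) + 1)
               for c in combinations(attrs, r)]
    return {(p, q) for p in subsets for q in subsets
            if not q <= p and holds(p, q, data)}

exact = valid_implications(objects)        # implications valid on all data
approx = valid_implications(objects[:2])   # valid only on a sample

tp = len(exact & approx)
precision = tp / len(approx)
recall = tp / len(exact)
```

Since an implication valid on all objects is valid on any subset of them, this sample-based surrogate never misses an exact implication (recall 1) and only precision suffers; it is a simplification of the trade-off the paper measures quantitatively.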
On the Computational Complexity of MapReduce
In this paper we study MapReduce computations from a complexity-theoretic
perspective. First, we formulate a uniform version of the MRC model of Karloff
et al. (2010). We then show that the class of regular languages, and moreover
all of sublogarithmic space, lies in constant round MRC. This result also
applies to the MPC model of Andoni et al. (2014). In addition, we prove that,
conditioned on a variant of the Exponential Time Hypothesis, there are strict
hierarchies within MRC so that increasing the number of rounds or the amount of
time per processor increases the power of MRC. To the best of our knowledge we
are the first to approach the MapReduce model with complexity-theoretic
techniques, and our work lays the foundation for further analysis relating
MapReduce to established complexity classes.
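The regular-languages result can be illustrated with the standard observation behind parallel DFA evaluation: a chunk of input is summarised by the DFA's transition function restricted to that chunk, and these summaries compose. A minimal sketch with a toy DFA and a single reducer (illustrative only, not the paper's formal MRC machinery):

```python
# DFA for "binary strings with an even number of 1s"
DELTA = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
STATES = (0, 1)
START, ACCEPT = 0, {0}

def map_chunk(chunk):
    """Mapper: summarise a chunk as the function q -> state reached from q."""
    summary = {}
    for q in STATES:
        s = q
        for ch in chunk:
            s = DELTA[(s, ch)]
        summary[q] = s
    return summary

def reduce_compose(summaries):
    """Reducer: compose the chunk summaries in input order."""
    q = START
    for f in summaries:
        q = f[q]
    return q in ACCEPT

word = "1101001"  # four 1s, so the word should be accepted
chunks = [word[i:i + 3] for i in range(0, len(word), 3)]
accepted = reduce_compose([map_chunk(c) for c in chunks])
```

Each mapper's summary has constant size (one entry per state), so the reduction fits in a constant number of rounds regardless of input length.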
The Range of Topological Effects on Communication
We continue the study of the communication cost of computing functions when
inputs are distributed among processors, each located at one vertex of a
network/graph; these input-bearing vertices are called terminals. Every other
node of the network also has a processor, but no input. The communication is
point-to-point, and the cost is the total number of bits exchanged by the
protocol, in the worst case, over all edges.
Chattopadhyay, Radhakrishnan and Rudra (FOCS'14) recently initiated a study
of the effect of topology of the network on the total communication cost using
tools from embeddings. Their techniques provided tight bounds for simple
functions like Element-Distinctness (ED), which depend on the 1-median of the
graph. This work addresses two other kinds of natural functions. We show that
for a large class of natural functions, like Set-Disjointness, the
communication cost is essentially a multiple of the cost of the optimal Steiner
tree connecting the terminals. Further, we show that for natural composed
functions, the naive protocols suggested by their definitions are optimal for
general networks. Interestingly,
the bounds for these functions depend on more involved topological parameters
that are a combination of Steiner tree and 1-median costs.
To obtain our results, we use some new tools in addition to ones used in
Chattopadhyay et al. These include (i) viewing the communication constraints
via a linear program; (ii) using tools from the theory of tree embeddings to
prove topology-sensitive direct-sum results that handle the case of composed
functions; and (iii) representing the communication constraints of certain
problems as a family of collections of multiway cuts, where each multiway cut
simulates the hardness of computing the function on the star topology.
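The two topological parameters the bounds combine, the 1-median cost and the Steiner tree cost, are easy to compute by brute force on a small unweighted network (the graph and terminal set below are arbitrary examples):

```python
from itertools import combinations
from collections import deque

# toy network: path 0-1-2-3-4 plus a hub 5 adjacent to 0 and 4
adj = {0: {1, 5}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {0, 4}}
terminals = {0, 2, 4}

def bfs_dist(src):
    d = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                q.append(v)
    return d

# 1-median cost: best total distance from a single vertex to all terminals
median_cost = min(sum(bfs_dist(v)[t] for t in terminals) for v in adj)

def connected(S):
    S = set(S)
    seen, q = {min(S)}, deque([min(S)])
    while q:
        u = q.popleft()
        for v in adj[u] & S:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen == S

# Steiner cost (unweighted): the smallest connected vertex set containing
# the terminals spans a tree with |S| - 1 edges
steiner_cost = min(
    len(S) - 1
    for r in range(len(terminals), len(adj) + 1)
    for S in combinations(adj, r)
    if terminals <= set(S) and connected(S))
```

Brute force is exponential, of course; the point is only to make the two parameters the abstract mentions concrete.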
Stochastic tasks: difficulty and Levin search
We establish a setting for asynchronous stochastic tasks that accounts
for episodes, rewards and responses, and, most especially, the
computational complexity of the algorithm behind an agent solving a
task. This is used to determine the difficulty of a task as the (logarithm
of the) number of computational steps required to acquire an acceptable
policy for the task, which includes the exploration of policies and their
verification. We also analyse instance difficulty, task compositions and
decompositions.
This work has been partially supported by the EU (FEDER) and the Spanish MINECO
under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P,
and by Generalitat Valenciana PROMETEOII 2015/013.
Hernández Orallo, J. (2015). Stochastic tasks: difficulty and Levin search. In
Artificial General Intelligence. Springer International Publishing. 90-100.
http://hdl.handle.net/10251/66686
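A toy version of the Levin-style search the paper builds on: enumerate candidate policies in order of description length, charge each one its verification cost, and report the logarithm of the total steps as the difficulty. Everything here (the bit-tuple policy encoding and the acceptance test) is an invented example, not the paper's formal setting.

```python
from itertools import product
from math import log2

def acceptable(policy, episodes):
    """A policy (a bit tuple) is acceptable if it solves every episode:
    here, episode i is solved when policy[i % len(policy)] matches the label."""
    return all(policy[i % len(policy)] == label
               for i, label in enumerate(episodes))

def levin_search(episodes, max_len=8):
    steps = 0
    for n in range(1, max_len + 1):        # shorter policies first
        for policy in product((0, 1), repeat=n):
            steps += len(episodes)         # cost of verifying this policy
            if acceptable(policy, episodes):
                return policy, steps, log2(steps)
    return None, steps, float("inf")

episodes = [1, 0, 1, 0, 1, 0]
policy, steps, difficulty = levin_search(episodes)
```

A periodic task is solved by a short policy, so it is found early and scores a low difficulty; a task whose shortest acceptable policy is long forces the enumeration (and hence the difficulty) to grow.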
Learning Ordinal Preferences on Multiattribute Domains: the Case of CP-nets
A recurrent issue in decision making is to extract a preference structure by observing the user's behavior in different situations. In this paper, we investigate the problem of learning ordinal preference orderings over discrete multi-attribute, or combinatorial, domains. Specifically, we focus on the learnability issue of conditional preference networks, or CP-nets, that have recently emerged as a popular graphical language for representing ordinal preferences in a concise and intuitive manner. This paper provides results in both passive and active learning. In the passive setting, the learner aims at finding a CP-net compatible with a supplied set of examples, while in the active setting the learner searches for the cheapest interaction policy with the user for acquiring the target CP-net.
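Passive CP-net learning can be illustrated with swap examples, pairs of outcomes that differ in a single variable. The sketch below is a naive learner, not one of the algorithms studied in the paper: it records, for each observed context, which value of the swapped variable was preferred, and flags contradictions.

```python
def learn_cpts(examples):
    """examples: (better, worse) outcome pairs differing in one variable."""
    cpts = {}  # variable -> {frozenset parent context -> preferred value}
    for better, worse in examples:
        flipped = [v for v in better if better[v] != worse[v]]
        assert len(flipped) == 1, "swap examples differ in exactly one variable"
        var = flipped[0]
        ctx = frozenset((k, better[k]) for k in better if k != var)
        table = cpts.setdefault(var, {})
        if table.setdefault(ctx, better[var]) != better[var]:
            raise ValueError(f"contradictory examples for {var}")
    return cpts

examples = [  # classic toy domain: fish goes with white wine, meat with red
    ({"main": "fish", "wine": "white"}, {"main": "meat", "wine": "white"}),
    ({"main": "fish", "wine": "white"}, {"main": "fish", "wine": "red"}),
    ({"main": "meat", "wine": "red"}, {"main": "meat", "wine": "white"}),
]
cpts = learn_cpts(examples)
```

Here every other variable is treated as a parent of the swapped one; real CP-net learners must also recover the sparser dependency graph, which is where the learnability questions arise.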
Risk-Averse Matchings over Uncertain Graph Databases
A large number of applications such as querying sensor networks, and
analyzing protein-protein interaction (PPI) networks, rely on mining uncertain
graph and hypergraph databases. In this work we study the following problem:
given an uncertain, weighted (hyper)graph, how can we efficiently find a
(hyper)matching with high expected reward, and low risk?
This problem naturally arises in the context of several important
applications, such as online dating, kidney exchanges, and team formation. We
introduce a novel formulation for finding matchings with maximum expected
reward and bounded risk under a general model of uncertain weighted
(hyper)graphs that we introduce in this work. Our model generalizes
probabilistic models used in prior work, and captures both continuous and
discrete probability distributions, thus allowing us to handle privacy-related
applications that inject appropriately distributed noise into (hyper)edge
weights. Given that our optimization problem is NP-hard, we turn our attention
to designing efficient approximation algorithms. For the case of uncertain
weighted graphs, we provide two approximation algorithms, one of which has
near-optimal run time. For the case of uncertain weighted hypergraphs, we
provide an approximation algorithm whose guarantee depends on the rank of the
hypergraph (i.e., the maximum number of nodes in any hyperedge) and that runs
in almost (modulo log factors) linear time.
We complement our theoretical results by testing our approximation algorithms
on a wide variety of synthetic experiments, where we observe, in a controlled
setting, interesting findings on the trade-off between reward and risk. We also
apply our formulation to recommending teams that are likely to collaborate and
have high impact.
Comment: 25 pages
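The reward/risk trade-off has a simple greedy caricature. The sketch below is a heuristic for illustration, not one of the paper's approximation algorithms: each edge is a weight realised with some probability, edges are scored by expected reward, and the total variance (one possible risk proxy) is kept under a budget.

```python
def greedy_matching(edges, risk_budget):
    """edges: list of (u, v, w, p), weight w realised with probability p.
    Greedily build a matching by expected reward, respecting a variance budget.
    Returns (matching, expected_reward, risk)."""
    matched, picked = set(), []
    reward = risk = 0.0
    for u, v, w, p in sorted(edges, key=lambda e: e[2] * e[3], reverse=True):
        var = (w ** 2) * p * (1 - p)  # variance of the Bernoulli reward
        if u in matched or v in matched or risk + var > risk_budget:
            continue
        picked.append((u, v))
        matched |= {u, v}
        reward += w * p
        risk += var
    return picked, reward, risk

edges = [("a", "b", 10, 0.5), ("b", "c", 6, 1.0), ("c", "d", 8, 0.9)]
```

Raising the budget admits the high-variance edge ("a", "b"); tightening it trades expected reward for certainty.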